Overview

Dataset statistics

Number of variables: 15
Number of observations: 99003
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 6
Duplicate rows (%): < 0.1%
Total size in memory: 6.8 MiB
Average record size in memory: 72.0 B
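
The memory and duplicate figures above are related by simple arithmetic: the average record size is the total size divided by the number of observations. The short check below is a sketch that uses only numbers from the table and assumes 1 MiB means 1,048,576 bytes; because the total size is already rounded in the report, the result is only expected to agree after rounding.

```python
# Figures copied from the dataset statistics above.
n_observations = 99003
duplicate_rows = 6
total_size_mib = 6.8  # already rounded in the report

# Assumption: 1 MiB = 1024 * 1024 bytes.
bytes_per_mib = 1024 * 1024
avg_record_bytes = total_size_mib * bytes_per_mib / n_observations
duplicate_pct = 100 * duplicate_rows / n_observations

print(round(avg_record_bytes, 1))  # compare with the reported 72.0 B
print(duplicate_pct < 0.1)         # compare with the reported "< 0.1%"
```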

Variable types

Numeric: 12
DateTime: 1
Categorical: 2

Alerts

Dataset has 6 (< 0.1%) duplicate rows [Duplicates]
tenure has a high cardinality: 2427 distinct values [High cardinality]
age is highly correlated with dob_year [High correlation]
dob_year is highly correlated with age [High correlation]
friend_count is highly correlated with friendships_initiated [High correlation]
friendships_initiated is highly correlated with friend_count [High correlation]
likes is highly correlated with mobile_likes and 1 other field [High correlation]
likes_received is highly correlated with mobile_likes_received and 1 other field [High correlation]
mobile_likes is highly correlated with likes [High correlation]
mobile_likes_received is highly correlated with likes_received and 1 other field [High correlation]
www_likes is highly correlated with likes [High correlation]
www_likes_received is highly correlated with likes_received and 1 other field [High correlation]
likes_received is highly skewed (γ1 = 112.0745682) [Skewed]
mobile_likes_received is highly skewed (γ1 = 107.5312999) [Skewed]
www_likes_received is highly skewed (γ1 = 126.257317) [Skewed]
friend_count has 1962 (2.0%) zeros [Zeros]
friendships_initiated has 2997 (3.0%) zeros [Zeros]
likes has 22308 (22.5%) zeros [Zeros]
likes_received has 24428 (24.7%) zeros [Zeros]
mobile_likes has 35056 (35.4%) zeros [Zeros]
mobile_likes_received has 30003 (30.3%) zeros [Zeros]
www_likes has 60999 (61.6%) zeros [Zeros]
www_likes_received has 36864 (37.2%) zeros [Zeros]
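
The skewness alerts above report γ1, the standardized third moment. As a rough illustration of why a column made up mostly of zeros and small counts plus a few very large values scores so high, here is a minimal sketch using only the Python standard library. The function name `skewness` and the toy data are assumptions made for illustration, and the profiler's exact estimator (for example, any bias correction) may differ from this plain population formula.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5

# Hypothetical toy columns, not taken from the dataset.
symmetric = [1, 2, 3, 4, 5]
heavy_tail = [0] * 95 + [1, 2, 3, 5, 1000]

print(skewness(symmetric))   # symmetric data: skewness near zero
print(skewness(heavy_tail))  # one extreme value: large positive skewness
```

A single extreme observation is enough to push γ1 far from zero, which is consistent with the maxima reported for the three skewed columns.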

Reproduction

Analysis started: 2022-11-01 12:49:07.442647
Analysis finished: 2022-11-01 12:49:32.224069
Duration: 24.78 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 101
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.28022383
Minimum: 13
Maximum: 113
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 13
5-th percentile: 15
Q1: 20
Median: 28
Q3: 50
95-th percentile: 90
Maximum: 113
Range: 100
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 22.58974831
Coefficient of variation (CV): 0.6059445462
Kurtosis: 1.561446767
Mean: 37.28022383
Median Absolute Deviation (MAD): 10
Skewness: 1.415260654
Sum: 3690854
Variance: 510.2967289
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
18  5196  5.2%
23  4404  4.4%
19  4391  4.4%
20  3769  3.8%
21  3671  3.7%
25  3641  3.7%
17  3283  3.3%
16  3086  3.1%
22  3032  3.1%
24  2827  2.9%
Other values (91)  61703  62.3%

Minimum 10 values
Value  Count  Frequency (%)
13  484  0.5%
14  1925  1.9%
15  2618  2.6%
16  3086  3.1%
17  3283  3.3%
18  5196  5.2%
19  4391  4.4%
20  3769  3.8%
21  3671  3.7%
22  3032  3.1%

Maximum 10 values
Value  Count  Frequency (%)
113  202  0.2%
112  18  < 0.1%
111  18  < 0.1%
110  15  < 0.1%
109  9  < 0.1%
108  1661  1.7%
107  98  0.1%
106  125  0.1%
105  80  0.1%
104  73  0.1%
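
Several of the age statistics above can be derived from one another, which gives a quick way to sanity-check a report like this one. The sketch below uses only values copied from the age summary and the dataset size; the tolerance is an assumption chosen to absorb the rounding of the printed figures.

```python
import math

# Values copied from the age summary and the dataset statistics.
n_obs = 99003
total = 3690854
mean = 37.28022383
std = 22.58974831
variance = 510.2967289
cv = 0.6059445462
q1, q3 = 20, 50
minimum, maximum = 13, 113

# Each derived statistic should be recoverable from the others.
checks = {
    "mean = sum / n": math.isclose(total / n_obs, mean, rel_tol=1e-7),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-7),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=1e-7),
    "range = max - min": maximum - minimum == 100,
    "iqr = q3 - q1": q3 - q1 == 30,
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```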

date_of_birth
DateTime

Distinct: 23151
Distinct (%): 23.4%
Missing: 0
Missing (%): 0.0%
Memory size: 773.6 KiB
Minimum: 1900-01-01 00:00:00
Maximum: 2000-10-27 00:00:00
Histogram with fixed size bins (bins=50)

dob_day
Real number (ℝ≥0)

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.53040817
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 7
Median: 14
Q3: 22
95-th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 9.015606359
Coefficient of variation (CV): 0.6204647697
Kurtosis: -1.188960111
Mean: 14.53040817
Median Absolute Deviation (MAD): 8
Skewness: 0.1078407568
Sum: 1438554
Variance: 81.28115802
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Common values
Value  Count  Frequency (%)
1  7900  8.0%
10  4030  4.1%
15  3555  3.6%
5  3545  3.6%
12  3413  3.4%
2  3409  3.4%
3  3291  3.3%
17  3266  3.3%
20  3263  3.3%
14  3219  3.3%
Other values (21)  60112  60.7%

Minimum 10 values
Value  Count  Frequency (%)
1  7900  8.0%
2  3409  3.4%
3  3291  3.3%
4  3217  3.2%
5  3545  3.6%
6  3108  3.1%
7  3010  3.0%
8  3202  3.2%
9  3003  3.0%
10  4030  4.1%

Maximum 10 values
Value  Count  Frequency (%)
31  1507  1.5%
30  2530  2.6%
29  2508  2.5%
28  2955  3.0%
27  2755  2.8%
26  2753  2.8%
25  3217  3.2%
24  2807  2.8%
23  2864  2.9%
22  2838  2.9%

dob_year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 101
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1975.719776
Minimum: 1900
Maximum: 2000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 1900
5-th percentile: 1923
Q1: 1963
Median: 1985
Q3: 1993
95-th percentile: 1998
Maximum: 2000
Range: 100
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 22.58974831
Coefficient of variation (CV): 0.01143368032
Kurtosis: 1.561446767
Mean: 1975.719776
Median Absolute Deviation (MAD): 10
Skewness: -1.415260654
Sum: 195602185
Variance: 510.2967289
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
1995  5196  5.2%
1990  4404  4.4%
1994  4391  4.4%
1993  3769  3.8%
1992  3671  3.7%
1988  3641  3.7%
1996  3283  3.3%
1997  3086  3.1%
1991  3032  3.1%
1989  2827  2.9%
Other values (91)  61703  62.3%

Minimum 10 values
Value  Count  Frequency (%)
1900  202  0.2%
1901  18  < 0.1%
1902  18  < 0.1%
1903  15  < 0.1%
1904  9  < 0.1%
1905  1661  1.7%
1906  98  0.1%
1907  125  0.1%
1908  80  0.1%
1909  73  0.1%

Maximum 10 values
Value  Count  Frequency (%)
2000  484  0.5%
1999  1925  1.9%
1998  2618  2.6%
1997  3086  3.1%
1996  3283  3.3%
1995  5196  5.2%
1994  4391  4.4%
1993  3769  3.8%
1992  3671  3.7%
1991  3032  3.1%
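
The age and dob_year summaries report the same standard deviation, variance and kurtosis, and skewness values that differ only in sign. That is the pattern one would expect if age were a fixed reference year minus dob_year, and the two reported means sum to almost exactly 2013, which suggests (but does not prove) that reference year. The sketch below illustrates the effect on made-up birth years; `skewness` is a hypothetical helper using the plain population formula, and the toy list is not data from the report.

```python
import math
import statistics

def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Means copied from the age and dob_year summaries.
mean_age = 37.28022383
mean_dob_year = 1975.719776
implied_reference_year = mean_age + mean_dob_year

# Hypothetical toy birth years, not rows from the dataset.
birth_years = [1900, 1985, 1990, 1993, 1995, 1995, 1998, 2000]
ages = [2013 - year for year in birth_years]  # assumed reference year

print(round(implied_reference_year, 3))
print(math.isclose(statistics.pstdev(ages), statistics.pstdev(birth_years)))
print(math.isclose(skewness(ages), -skewness(birth_years)))
```

Because one column is an exact linear function of the other, the pair carries the same information, which is why the report flags them as highly correlated.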

dob_month
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.283365151
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 9
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.529671569
Coefficient of variation (CV): 0.5617485987
Kurtosis: -1.240397572
Mean: 6.283365151
Median Absolute Deviation (MAD): 3
Skewness: 0.03129550742
Sum: 622072
Variance: 12.45858138
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Common values
Value  Count  Frequency (%)
1  11772  11.9%
10  8476  8.6%
5  8271  8.4%
8  8266  8.3%
3  8110  8.2%
7  8021  8.1%
9  7939  8.0%
12  7894  8.0%
4  7810  7.9%
2  7632  7.7%
Other values (2)  14812  15.0%

Minimum 10 values
Value  Count  Frequency (%)
1  11772  11.9%
2  7632  7.7%
3  8110  8.2%
4  7810  7.9%
5  8271  8.4%
6  7607  7.7%
7  8021  8.1%
8  8266  8.3%
9  7939  8.0%
10  8476  8.6%

Maximum 10 values
Value  Count  Frequency (%)
12  7894  8.0%
11  7205  7.3%
10  8476  8.6%
9  7939  8.0%
8  8266  8.3%
7  8021  8.1%
6  7607  7.7%
5  8271  8.4%
4  7810  7.9%
3  8110  8.2%

gender
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 773.6 KiB

male  58574
female  40254
NA  175

Length

Max length: 6
Median length: 4
Mean length: 4.809652233
Min length: 2

Characters and Unicode

Total characters: 476170
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: male
4th row: female
5th row: male

Common Values

Value  Count  Frequency (%)
male  58574  59.2%
female  40254  40.7%
NA  175  0.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
male  58574  59.2%
female  40254  40.7%
na  175  0.2%

Most occurring characters

Value  Count  Frequency (%)
e  139082  29.2%
m  98828  20.8%
a  98828  20.8%
l  98828  20.8%
f  40254  8.5%
N  175  < 0.1%
A  175  < 0.1%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  475820  99.9%
Uppercase Letter  350  0.1%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e  139082  29.2%
m  98828  20.8%
a  98828  20.8%
l  98828  20.8%
f  40254  8.5%

Uppercase Letter
Value  Count  Frequency (%)
N  175  50.0%
A  175  50.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  476170  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e  139082  29.2%
m  98828  20.8%
a  98828  20.8%
l  98828  20.8%
f  40254  8.5%
N  175  < 0.1%
A  175  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  476170  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
e  139082  29.2%
m  98828  20.8%
a  98828  20.8%
l  98828  20.8%
f  40254  8.5%
N  175  < 0.1%
A  175  < 0.1%

tenure
Categorical

HIGH CARDINALITY

Distinct: 2427
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 773.6 KiB

300  173
303  170
242  164
272  163
257  161
Other values (2422)  98172

Length

Max length: 4
Median length: 3
Mean length: 3.031595002
Min length: 1

Characters and Unicode

Total characters: 300137
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 133
Unique (%): 0.1%

Sample

1st row: 266
2nd row: 6
3rd row: 13
4th row: 93
5th row: 82

Common Values

Value  Count  Frequency (%)
300  173  0.2%
303  170  0.2%
242  164  0.2%
272  163  0.2%
257  161  0.2%
297  161  0.2%
280  160  0.2%
285  160  0.2%
278  158  0.2%
284  158  0.2%
Other values (2417)  97375  98.4%

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
300  173  0.2%
303  170  0.2%
242  164  0.2%
272  163  0.2%
257  161  0.2%
297  161  0.2%
280  160  0.2%
285  160  0.2%
278  158  0.2%
284  158  0.2%
Other values (2417)  97375  98.4%

Most occurring characters

Value  Count  Frequency (%)
1  46163  15.4%
2  36960  12.3%
3  34833  11.6%
4  33047  11.0%
5  30079  10.0%
6  27470  9.2%
7  24933  8.3%
8  23035  7.7%
9  21940  7.3%
0  21673  7.2%
Other values (2)  4  < 0.1%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  300133  > 99.9%
Uppercase Letter  4  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  46163  15.4%
2  36960  12.3%
3  34833  11.6%
4  33047  11.0%
5  30079  10.0%
6  27470  9.2%
7  24933  8.3%
8  23035  7.7%
9  21940  7.3%
0  21673  7.2%

Uppercase Letter
Value  Count  Frequency (%)
N  2  50.0%
A  2  50.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  300133  > 99.9%
Latin  4  < 0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  46163  15.4%
2  36960  12.3%
3  34833  11.6%
4  33047  11.0%
5  30079  10.0%
6  27470  9.2%
7  24933  8.3%
8  23035  7.7%
9  21940  7.3%
0  21673  7.2%

Latin
Value  Count  Frequency (%)
N  2  50.0%
A  2  50.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  300137  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  46163  15.4%
2  36960  12.3%
3  34833  11.6%
4  33047  11.0%
5  30079  10.0%
6  27470  9.2%
7  24933  8.3%
8  23035  7.7%
9  21940  7.3%
0  21673  7.2%
Other values (2)  4  < 0.1%
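
tenure is typed as Categorical even though nearly all of its characters are decimal digits. The character tables show just two "N" and two "A" characters, which is consistent with two literal "NA" strings in an otherwise numeric column. A minimal sketch of one way to coerce such a column before profiling follows; `parse_tenure` is a hypothetical helper, the choice to treat "NA" and the empty string as missing is an assumption, and the sample list is illustrative rather than taken from the data.

```python
from typing import Optional

def parse_tenure(raw: str, missing_tokens=("NA", "")) -> Optional[int]:
    """Convert a raw tenure string to int, mapping missing tokens to None."""
    text = raw.strip()
    if text in missing_tokens:
        return None
    return int(text)

# Illustrative values; "NA" stands in for the two non-numeric entries.
sample = ["266", "6", "13", "NA", "93"]
parsed = [parse_tenure(value) for value in sample]
print(parsed)
```

With the column parsed this way, a profiler would have the chance to treat tenure as numeric and report quantiles instead of character statistics.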

friend_count
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 2562
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 196.3507873
Minimum: 0
Maximum: 4923
Zeros: 1962
Zeros (%): 2.0%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 31
Median: 82
Q3: 206
95-th percentile: 720
Maximum: 4923
Range: 4923
Interquartile range (IQR): 175

Descriptive statistics

Standard deviation: 387.304229
Coefficient of variation (CV): 1.972511719
Kurtosis: 50.09427289
Mean: 196.3507873
Median Absolute Deviation (MAD): 64
Skewness: 6.059008484
Sum: 19439317
Variance: 150004.5658
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
0  1962  2.0%
1  1816  1.8%
2  1117  1.1%
3  860  0.9%
5  789  0.8%
4  749  0.8%
10  737  0.7%
24  732  0.7%
6  720  0.7%
29  719  0.7%
Other values (2552)  88802  89.7%

Minimum 10 values
Value  Count  Frequency (%)
0  1962  2.0%
1  1816  1.8%
2  1117  1.1%
3  860  0.9%
4  749  0.8%
5  789  0.8%
6  720  0.7%
7  671  0.7%
8  718  0.7%
9  700  0.7%

Maximum 10 values
Value  Count  Frequency (%)
4923  1  < 0.1%
4917  1  < 0.1%
4863  1  < 0.1%
4845  1  < 0.1%
4844  1  < 0.1%
4826  1  < 0.1%
4817  1  < 0.1%
4803  1  < 0.1%
4797  1  < 0.1%
4794  1  < 0.1%

friendships_initiated
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1519
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.4524711
Minimum: 0
Maximum: 4144
Zeros: 2997
Zeros (%): 3.0%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 17
Median: 46
Q3: 117
95-th percentile: 418
Maximum: 4144
Range: 4144
Interquartile range (IQR): 100

Descriptive statistics

Standard deviation: 188.786951
Coefficient of variation (CV): 1.756934475
Kurtosis: 42.53560096
Mean: 107.4524711
Median Absolute Deviation (MAD): 36
Skewness: 5.150757415
Sum: 10638117
Variance: 35640.51287
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
0  2997  3.0%
1  2212  2.2%
2  1551  1.6%
3  1355  1.4%
4  1352  1.4%
6  1328  1.3%
5  1328  1.3%
11  1319  1.3%
8  1314  1.3%
13  1279  1.3%
Other values (1509)  82968  83.8%

Minimum 10 values
Value  Count  Frequency (%)
0  2997  3.0%
1  2212  2.2%
2  1551  1.6%
3  1355  1.4%
4  1352  1.4%
5  1328  1.3%
6  1328  1.3%
7  1237  1.2%
8  1314  1.3%
9  1245  1.3%

Maximum 10 values
Value  Count  Frequency (%)
4144  1  < 0.1%
3654  1  < 0.1%
3594  1  < 0.1%
3538  1  < 0.1%
3415  1  < 0.1%
3238  1  < 0.1%
3233  1  < 0.1%
3086  1  < 0.1%
3078  1  < 0.1%
3024  1  < 0.1%

likes
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 2924
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 156.0787855
Minimum: 0
Maximum: 25111
Zeros: 22308
Zeros (%): 22.5%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 11
Q3: 81
95-th percentile: 726
Maximum: 25111
Range: 25111
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 572.2806808
Coefficient of variation (CV): 3.666614134
Kurtosis: 200.4456878
Mean: 156.0787855
Median Absolute Deviation (MAD): 11
Skewness: 11.02370356
Sum: 15452268
Variance: 327505.1777
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
0  22308  22.5%
1  6928  7.0%
2  4434  4.5%
3  3240  3.3%
4  2507  2.5%
5  2027  2.0%
6  1806  1.8%
7  1618  1.6%
8  1430  1.4%
9  1381  1.4%
Other values (2914)  51324  51.8%

Minimum 10 values
Value  Count  Frequency (%)
0  22308  22.5%
1  6928  7.0%
2  4434  4.5%
3  3240  3.3%
4  2507  2.5%
5  2027  2.0%
6  1806  1.8%
7  1618  1.6%
8  1430  1.4%
9  1381  1.4%

Maximum 10 values
Value  Count  Frequency (%)
25111  1  < 0.1%
21652  1  < 0.1%
16732  1  < 0.1%
16583  1  < 0.1%
14799  1  < 0.1%
14355  1  < 0.1%
14050  1  < 0.1%
14039  1  < 0.1%
13692  1  < 0.1%
13622  1  < 0.1%

likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 2681
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 142.6893629
Minimum: 0
Maximum: 261197
Zeros: 24428
Zeros (%): 24.7%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 8
Q3: 59
95-th percentile: 561
Maximum: 261197
Range: 261197
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 1387.919613
Coefficient of variation (CV): 9.726861091
Kurtosis: 17384.94
Mean: 142.6893629
Median Absolute Deviation (MAD): 8
Skewness: 112.0745682
Sum: 14126675
Variance: 1926320.851
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
0  24428  24.7%
1  7305  7.4%
2  4541  4.6%
3  3347  3.4%
4  2669  2.7%
5  2373  2.4%
6  1873  1.9%
7  1680  1.7%
8  1538  1.6%
9  1351  1.4%
Other values (2671)  47898  48.4%

Minimum 10 values
Value  Count  Frequency (%)
0  24428  24.7%
1  7305  7.4%
2  4541  4.6%
3  3347  3.4%
4  2669  2.7%
5  2373  2.4%
6  1873  1.9%
7  1680  1.7%
8  1538  1.6%
9  1351  1.4%

Maximum 10 values
Value  Count  Frequency (%)
261197  1  < 0.1%
178166  1  < 0.1%
152014  1  < 0.1%
106025  1  < 0.1%
82623  1  < 0.1%
53534  1  < 0.1%
52964  1  < 0.1%
45633  1  < 0.1%
42449  1  < 0.1%
39536  1  < 0.1%

mobile_likes
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 2396
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.1162995
Minimum: 0
Maximum: 25111
Zeros: 35056
Zeros (%): 35.4%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4
Q3: 46
95-th percentile: 481.9
Maximum: 25111
Range: 25111
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 445.2529851
Coefficient of variation (CV): 4.195896268
Kurtosis: 360.9885806
Mean: 106.1162995
Median Absolute Deviation (MAD): 4
Skewness: 14.16123656
Sum: 10505832
Variance: 198250.2207
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
0  35056  35.4%
1  6297  6.4%
2  3941  4.0%
3  2917  2.9%
4  2265  2.3%
5  1794  1.8%
6  1598  1.6%
7  1395  1.4%
8  1212  1.2%
9  1149  1.2%
Other values (2386)  41379  41.8%

Minimum 10 values
Value  Count  Frequency (%)
0  35056  35.4%
1  6297  6.4%
2  3941  4.0%
3  2917  2.9%
4  2265  2.3%
5  1794  1.8%
6  1598  1.6%
7  1395  1.4%
8  1212  1.2%
9  1149  1.2%

Maximum 10 values
Value  Count  Frequency (%)
25111  1  < 0.1%
21652  1  < 0.1%
16732  1  < 0.1%
14039  1  < 0.1%
13529  1  < 0.1%
12934  1  < 0.1%
12639  1  < 0.1%
12104  1  < 0.1%
12083  1  < 0.1%
11959  1  < 0.1%

mobile_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 2004
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84.1204913
Minimum: 0
Maximum: 138561
Zeros: 30003
Zeros (%): 30.3%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4
Q3: 33
95-th percentile: 317
Maximum: 138561
Range: 138561
Interquartile range (IQR): 33

Descriptive statistics

Standard deviation: 839.8894437
Coefficient of variation (CV): 9.984362083
Kurtosis: 15522.64932
Mean: 84.1204913
Median Absolute Deviation (MAD): 4
Skewness: 107.5312999
Sum: 8328181
Variance: 705414.2777
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
0  30003  30.3%
1  8243  8.3%
2  4948  5.0%
3  3608  3.6%
4  2944  3.0%
5  2383  2.4%
6  2022  2.0%
7  1745  1.8%
8  1521  1.5%
9  1437  1.5%
Other values (1994)  40149  40.6%

Minimum 10 values
Value  Count  Frequency (%)
0  30003  30.3%
1  8243  8.3%
2  4948  5.0%
3  3608  3.6%
4  2944  3.0%
5  2383  2.4%
6  2022  2.0%
7  1745  1.8%
8  1521  1.5%
9  1437  1.5%

Maximum 10 values
Value  Count  Frequency (%)
138561  1  < 0.1%
131244  1  < 0.1%
89911  1  < 0.1%
73333  1  < 0.1%
43410  1  < 0.1%
30754  1  < 0.1%
30387  1  < 0.1%
27353  1  < 0.1%
20770  1  < 0.1%
18925  1  < 0.1%

www_likes
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1726
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.96242538
Minimum: 0
Maximum: 14865
Zeros: 60999
Zeros (%): 61.6%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 7
95-th percentile: 208
Maximum: 14865
Range: 14865
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 285.5601519
Coefficient of variation (CV): 5.715498191
Kurtosis: 449.1484832
Mean: 49.96242538
Median Absolute Deviation (MAD): 0
Skewness: 16.91102529
Sum: 4946430
Variance: 81544.60033
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
0  60999  61.6%
1  4697  4.7%
2  2760  2.8%
3  1948  2.0%
4  1419  1.4%
5  1202  1.2%
6  1081  1.1%
7  897  0.9%
8  792  0.8%
9  757  0.8%
Other values (1716)  22451  22.7%

Minimum 10 values
Value  Count  Frequency (%)
0  60999  61.6%
1  4697  4.7%
2  2760  2.8%
3  1948  2.0%
4  1419  1.4%
5  1202  1.2%
6  1081  1.1%
7  897  0.9%
8  792  0.8%
9  757  0.8%

Maximum 10 values
Value  Count  Frequency (%)
14865  1  < 0.1%
12903  1  < 0.1%
11077  1  < 0.1%
10763  1  < 0.1%
10627  1  < 0.1%
10539  1  < 0.1%
10255  1  < 0.1%
10232  1  < 0.1%
9902  1  < 0.1%
9431  1  < 0.1%

www_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1636
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.56883125
Minimum: 0
Maximum: 129953
Zeros: 36864
Zeros (%): 37.2%
Negative: 0
Negative (%): 0.0%
Memory size: 386.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 2
Q3: 20
95-th percentile: 227
Maximum: 129953
Range: 129953
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 601.416348
Coefficient of variation (CV): 10.26853934
Kurtosis: 23812.2491
Mean: 58.56883125
Median Absolute Deviation (MAD): 2
Skewness: 126.257317
Sum: 5798490
Variance: 361701.6237
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
0  36864  37.2%
1  8513  8.6%
2  5111  5.2%
3  3586  3.6%
4  2828  2.9%
5  2317  2.3%
6  1918  1.9%
7  1602  1.6%
8  1445  1.5%
9  1373  1.4%
Other values (1626)  33446  33.8%

Minimum 10 values
Value  Count  Frequency (%)
0  36864  37.2%
1  8513  8.6%
2  5111  5.2%
3  3586  3.6%
4  2828  2.9%
5  2317  2.3%
6  1918  1.9%
7  1602  1.6%
8  1445  1.5%
9  1373  1.4%

Maximum 10 values
Value  Count  Frequency (%)
129953  1  < 0.1%
62103  1  < 0.1%
39605  1  < 0.1%
39213  1  < 0.1%
34039  1  < 0.1%
32692  1  < 0.1%
29337  1  < 0.1%
23147  1  < 0.1%
22644  1  < 0.1%
15096  1  < 0.1%

Interactions

Correlations

Auto

The auto setting is an easily interpretable pairwise column metric that picks a method according to the types of the two variables: categorical-categorical pairs use Cramér's V, numerical-categorical pairs use Cramér's V (on a discretized numerical column), and numerical-numerical pairs use Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
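
The type-to-method mapping described above can be written as a small lookup. This is only an illustrative sketch of the rule as stated in the text, not the profiler's implementation; the function name `auto_method` and the string labels are assumptions.

```python
def auto_method(type_a: str, type_b: str) -> str:
    """Return the correlation method the text assigns to a pair of variable types."""
    kinds = {type_a, type_b}
    if not kinds <= {"numerical", "categorical"}:
        raise ValueError(f"unsupported variable types: {type_a!r}, {type_b!r}")
    if kinds == {"numerical"}:
        return "spearman"
    # categorical-categorical, or numerical-categorical after discretization
    return "cramers_v"

print(auto_method("numerical", "numerical"))
print(auto_method("numerical", "categorical"))
print(auto_method("categorical", "categorical"))
```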

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
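
A minimal sketch of that recipe, using only the standard library: rank both variables (tied values share the mean of their ranks), then apply the covariance-over-standard-deviations formula to the ranks. The helper names and the toy data are assumptions for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman(x, y))
print(pearson(x, y))
```

On this toy data the rank-based coefficient reaches its maximum while Pearson's r stays below it, which is the behaviour the description refers to.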

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
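
The calculation, and the invariance under changes of location and (positive) scale, can be sketched in a few lines of standard-library Python. The function name `pearson` and the toy data are assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r = pearson(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson([3 * a + 7 for a in x], [0.5 * b - 2 for b in y])
print(r, r_rescaled)
```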

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
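
The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated in the text (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical libraries commonly apply a tie correction, so their results can differ when ties are present. The function name and toy data are assumptions.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3], [1, 2, 3]))  # all pairs concordant
print(kendall_tau_a([1, 2, 3], [3, 2, 1]))  # all pairs discordant
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # two concordant, one discordant
```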

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
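
To make the idea concrete, a nullity matrix is just a grid of present/missing flags, one row per record and one column per variable. The sketch below builds one from a few made-up records; the helper name `nullity_matrix`, the use of `None` as the missing marker, and the toy rows are all assumptions (the dataset in this report has no missing cells).

```python
def nullity_matrix(rows, columns, missing=(None,)):
    """Return one list of booleans per row: True where the cell is present."""
    return [[row.get(col) not in missing for col in columns] for row in rows]

# Hypothetical records with two gaps, for illustration only.
rows = [
    {"age": 14, "gender": "male", "tenure": 266},
    {"age": 14, "gender": None, "tenure": 6},
    {"age": 13, "gender": "male", "tenure": None},
]
columns = ["age", "gender", "tenure"]

matrix = nullity_matrix(rows, columns)
missing_per_column = {
    col: sum(not present[i] for present in matrix)
    for i, col in enumerate(columns)
}

# Text rendering: '#' marks a present cell, '.' a missing one.
for present in matrix:
    print("".join("#" if flag else "." for flag in present))
print(missing_per_column)
```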

Sample

First rows

index  age  date_of_birth  dob_day  dob_year  dob_month  gender  tenure  friend_count  friendships_initiated  likes  likes_received  mobile_likes  mobile_likes_received  www_likes  www_likes_received
0  14  1999-11-19  19  1999  11  male  266  0  0  0  0  0  0  0  0
1  14  1999-11-02  2  1999  11  female  6  0  0  0  0  0  0  0  0
2  14  1999-11-16  16  1999  11  male  13  0  0  0  0  0  0  0  0
3  14  1999-12-25  25  1999  12  female  93  0  0  0  0  0  0  0  0
4  14  1999-12-04  4  1999  12  male  82  0  0  0  0  0  0  0  0
5  14  1999-12-01  1  1999  12  male  15  0  0  0  0  0  0  0  0
6  13  2000-01-14  14  2000  1  male  12  0  0  0  0  0  0  0  0
7  13  2000-01-04  4  2000  1  female  0  0  0  0  0  0  0  0  0
8  13  2000-01-01  1  2000  1  male  81  0  0  0  0  0  0  0  0
9  13  2000-02-02  2  2000  2  male  171  0  0  0  0  0  0  0  0

Last rows

index  age  date_of_birth  dob_day  dob_year  dob_month  gender  tenure  friend_count  friendships_initiated  likes  likes_received  mobile_likes  mobile_likes_received  www_likes  www_likes_received
98993  19  1994-08-15  15  1994  8  male  394  4538  4144  4501  15088  4435  5961  66  9127
98994  20  1993-01-04  4  1993  1  female  402  1988  332  7351  106025  7248  73333  103  32692
98995  20  1993-10-09  9  1993  10  female  699  3611  973  4507  7768  4414  6909  93  859
98996  24  1989-04-25  25  1989  4  female  182  2938  1272  6018  17765  5843  11708  175  6057
98997  28  1985-12-14  14  1985  12  female  290  2218  1618  4626  10268  4290  4250  336  6018
98998  68  1945-04-04  4  1945  4  female  541  2118  341  3996  18089  3505  11887  491  6202
98999  18  1995-03-12  12  1995  3  female  21  1968  1720  4401  13412  4399  10592  2  2820
99000  15  1998-05-10  10  1998  5  female  111  2002  1524  11959  12554  11959  11462  0  1092
99001  23  1990-04-11  11  1990  4  female  416  2560  185  4506  6516  4506  5760  0  756
99002  39  1974-05-15  15  1974  5  female  397  2049  768  9410  12443  9410  9530  0  2913
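
In the ten last rows shown above, likes equals mobile_likes plus www_likes, and likes_received equals mobile_likes_received plus www_likes_received, which helps explain the high-correlation alerts among these columns. The sketch below checks that on the values read from those rows, and also compares the column sums reported in the variable summaries; the sums differ by a few counts, so the identity should not be assumed to hold for every row of the dataset.

```python
# (likes, likes_received, mobile_likes, mobile_likes_received,
#  www_likes, www_likes_received) as read from the ten last rows.
last_rows = [
    (4501, 15088, 4435, 5961, 66, 9127),
    (7351, 106025, 7248, 73333, 103, 32692),
    (4507, 7768, 4414, 6909, 93, 859),
    (6018, 17765, 5843, 11708, 175, 6057),
    (4626, 10268, 4290, 4250, 336, 6018),
    (3996, 18089, 3505, 11887, 491, 6202),
    (4401, 13412, 4399, 10592, 2, 2820),
    (11959, 12554, 11959, 11462, 0, 1092),
    (4506, 6516, 4506, 5760, 0, 756),
    (9410, 12443, 9410, 9530, 0, 2913),
]

def decomposes(row):
    """True if both totals equal the sum of their mobile and www parts."""
    likes, received, mobile, mobile_received, www, www_received = row
    return likes == mobile + www and received == mobile_received + www_received

rows_consistent = all(decomposes(row) for row in last_rows)

# Column sums copied from the variable summaries.
likes_gap = 15452268 - (10505832 + 4946430)
received_gap = 14126675 - (8328181 + 5798490)

print(rows_consistent)
print(likes_gap, received_gap)
```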

Duplicate rows

Most frequently occurring

agedate_of_birthdob_daydob_yeardob_monthgendertenurefriend_countfriendships_initiatedlikeslikes_receivedmobile_likesmobile_likes_receivedwww_likeswww_likes_received# duplicates
2251988-01-01119881male21000000004
0231990-01-01119901male507110000002
1251988-01-01119881male0000000002
3251988-01-01119881male33000000002
4261987-01-01119871male500220000002
5331980-01-01119801male17000000002